@shaneahmed shaneahmed commented Mar 31, 2023

  • Improve Engines performance and implementation
  • Redesigns PatchPredictor engine using the new EngineABC base class.
  • WSIs are now processed with the same code as patches, using a WSI-based dataloader.
  • The intermediate output is saved as zarr for the WSIs to resolve memory issues.
  • The output of model architectures should now be a dictionary.
  • The output can be specified as AnnotationStore for visualisation using TIAViz.
  • Fix mypy Type Checks for cli/common.py
  • Add PatchPredictor Engine based on EngineABC
  • Add return_probabilities option to Params
  • Removes merge_predictions option in PatchPredictor engine.
  • Defines post_process_cache_mode which allows running the algorithm on WSI
  • Add infer_wsi for WSI inference
  • Removes save_wsi_output as this is not required after post-processing.
  • Removes merge_predictions and fixes docstring in EngineABCRunParams
  • compile_model is now moved to EngineABC init
  • Fixes bug with _calculate_scale_factor
  • Fixes a bug in class_dict definition.
  • _get_zarr_array is now a public function get_zarr_array in misc
  • patch_predictions_as_annotations runs the loop on patch_coords instead of class_probs

@shaneahmed shaneahmed self-assigned this Mar 31, 2023
@shaneahmed shaneahmed added the enhancement New feature or request label Mar 31, 2023

codecov bot commented Mar 31, 2023

Codecov Report

❌ Patch coverage is 93.62445% with 73 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.07%. Comparing base (ce25587) to head (80e7af5).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| tiatoolbox/models/dataset/dataset_abc.py | 73.97% | 38 Missing ⚠️ |
| tiatoolbox/models/engine/io_config.py | 56.75% | 32 Missing ⚠️ |
| tiatoolbox/cli/nucleus_instance_segment.py | 66.66% | 1 Missing ⚠️ |
| ...iatoolbox/models/architecture/timm_efficientnet.py | 99.19% | 0 Missing and 1 partial ⚠️ |
| tiatoolbox/utils/misc.py | 97.77% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #578      +/-   ##
===========================================
- Coverage    99.27%   95.07%   -4.20%     
===========================================
  Files           71       77       +6     
  Lines         9161     9674     +513     
  Branches      1195     1253      +58     
===========================================
+ Hits          9095     9198     +103     
- Misses          40      440     +400     
- Partials        26       36      +10     

- Refactor engines_abc.py
@shaneahmed shaneahmed changed the title ⚡ Improve Engines Performance and Implementation ⚡ Improve Engine Performance and Implementation Apr 28, 2023
shaneahmed and others added 30 commits June 9, 2025 12:29
# Conflicts:
#	tests/models/test_feature_extractor.py
#	tiatoolbox/models/models_abc.py
# Conflicts:
#	tiatoolbox/cli/common.py
#	tiatoolbox/cli/nucleus_instance_segment.py
#	tiatoolbox/cli/patch_predictor.py
#	tiatoolbox/models/engine/semantic_segmentor.py
* ⚡ Make WSIPatchDataset Pickleable to Support Windows Multithreading (#947)

This PR makes the WSIPatchDataset class picklable by delaying the creation of the reader object until the first call to `__getitem__`. This enables the use of multiple loader workers on Windows without errors and provides significant performance improvements.

- Delays reader object instantiation to the first `__getitem__` call instead of during initialization
- Extracts reader creation logic into a separate `_get_reader` method
- Stores image path and mode as instance variables for lazy initialization
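
The lazy-initialisation pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual `WSIPatchDataset` code: `FakeReader` and `LazyPatchDataset` are hypothetical stand-ins, and a `threading.Lock` is used only to mimic a reader that holds an unpicklable resource.

```python
import pickle
import threading


class FakeReader:
    """Hypothetical stand-in for a slide reader holding an unpicklable resource."""

    def __init__(self, path):
        self.path = path
        # A lock cannot be pickled, much like an open file handle.
        self._handle = threading.Lock()

    def read(self, index):
        return f"{self.path}:{index}"


class LazyPatchDataset:
    """Dataset that opens its reader on first access, not in __init__."""

    def __init__(self, img_path, mode="wsi"):
        # Only plain, picklable values are stored at construction time.
        self.img_path = img_path
        self.mode = mode
        self.reader = None

    def _get_reader(self):
        # Create the reader once; each worker process builds its own copy.
        if self.reader is None:
            self.reader = FakeReader(self.img_path)
        return self.reader

    def __getitem__(self, index):
        return self._get_reader().read(index)


dataset = LazyPatchDataset("slide.svs")
clone = pickle.loads(pickle.dumps(dataset))  # succeeds: no reader exists yet
print(clone[3])  # slide.svs:3
```

Because the dataset is pickled before any item is requested, the copy sent to each worker contains only the path and mode, and the reader is created inside the worker on its first `__getitem__` call.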

Speedup for the WSI prediction cell of the patch_prediction example notebook:
2 min 48 s with 0 loader workers -> 1 min 13 s with 4 workers.

Note: this PR has no effect on Linux, where multiple loader workers already work. Linux typically starts worker processes with `fork`, which does not require the dataset to be picklable, whereas Windows uses `spawn`, which does.

* 🔀 Merge branch develop into dev-engine-abc

* 🐛 Fix reader_info read

---------

Co-authored-by: Mark Eastwood <[email protected]>
# Conflicts:
#	tiatoolbox/models/dataset/classification.py
# Conflicts:
#	tests/models/test_patch_predictor.py
# Conflicts:
#	tests/models/test_feature_extractor.py
#	tests/models/test_multi_task_segmentor.py
#	tests/models/test_nucleus_instance_segmentor.py
#	tests/models/test_patch_predictor.py
#	tests/models/test_semantic_segmentation.py
#	tiatoolbox/models/architecture/__init__.py
## Summary of Changes

### Major Additions
- **Dask Integration:**  
  - Added `dask` as a dependency and integrated Dask arrays and lazy computation throughout the engine and patch predictor code.
  - Added Dask-based merging, chunking, and memory-aware processing for large images and WSIs.

- **Zarr Output Support:**  
  - Added support for saving model predictions and intermediate results directly to Zarr format.
  - New CLI options and internal logic for Zarr output, including memory thresholding and chunked writes.

- **SemanticSegmentor Engine:**  
  - Added a new `SemanticSegmentor` engine with Dask/Zarr support and new test coverage (`test_semantic_segmentor.py`).
  - Added CLI entrypoint for `semantic_segmentor` and removed the old `semantic_segment` CLI.

- **Enhanced CLI and Config:**  
  - Added CLI options for memory threshold, unified worker options, and improved mask handling.
  - Updated YAML configs and sample data for new models and test images.

- **Utilities and Validation:**  
  - Added utility functions for minimal dtype casting, patch/stride validation, and improved error handling (e.g., `DimensionMismatchError`).
  - Improved annotation store conversion for Dask arrays and Zarr-backed outputs.

- **Changes to `kwargs`:**
  - Added `memory-threshold`.
  - Unified `num-loader-workers` and `num-postproc-workers` into `num-workers`.
  - Removed `cache_mode`, as caching is now handled automatically.
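
To make the idea of memory thresholding and chunked writes concrete, here is a minimal standard-library sketch. The function names are hypothetical, and reading `memory_threshold` as a percentage of available memory is an assumption rather than something stated in this PR. In the engines themselves, Dask and Zarr handle the chunking.

```python
import math


def estimate_nbytes(shape, itemsize):
    """Bytes needed for a dense array with this shape and element size."""
    return math.prod(shape) * itemsize


def should_spill_to_disk(shape, itemsize, available_bytes, memory_threshold=80):
    """Return True when the array would exceed the allowed share of memory.

    `memory_threshold` is treated as a percentage of available memory
    (an assumption made for this sketch).
    """
    budget = available_bytes * memory_threshold / 100
    return estimate_nbytes(shape, itemsize) > budget


def chunk_slices(length, chunk):
    """Yield slices covering range(length) in pieces of at most `chunk`."""
    for start in range(0, length, chunk):
        yield slice(start, min(start + chunk, length))


GIB = 1024 ** 3
# A 40000 x 40000 x 2 float32 map needs 12.8 GB, above 80% of 8 GiB.
big = should_spill_to_disk((40_000, 40_000, 2), 4, available_bytes=8 * GIB)
small = should_spill_to_disk((1_000, 1_000, 2), 4, available_bytes=8 * GIB)
print(big, small)  # True False
print(list(chunk_slices(10, 4)))
```

When the check returns `True`, the output would be written piece by piece (for example, one row block per slice) instead of being held in memory as a single array.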

---

### Major Removals/Refactors
- **Removed Old CLI and Redundant Code:**  
  - Deleted the old `semantic_segment.py` CLI and replaced it with `semantic_segmentor.py`.
  - Removed legacy cache mode and patch prediction Zarr store tests.

- **Refactored Model and Dataset APIs:**  
  - Unified and simplified model inference APIs to always return arrays (not dicts) for batch outputs.
  - Refactored dataset classes to enforce patch shape validation and remove legacy “mode” logic.

- **Test Cleanup:**  
  - Removed or updated tests that relied on old APIs or cache mode.
  - Refactored test assertions for new output types and Dask array handling.

- **API Consistency:**  
  - Standardized function and argument names across engines, CLI, and utility modules.
  - Updated docstrings and type hints for clarity and consistency.

---

### Notable File Changes
- **New:**  
  - `tiatoolbox/cli/semantic_segmentor.py`
  - `tests/engines/test_semantic_segmentor.py`

- **Removed:**  
  - `tiatoolbox/cli/semantic_segment.py`
  - Old cache mode and patch Zarr store tests

- **Heavily Modified:**  
  - `engine_abc.py`, `patch_predictor.py`, `semantic_segmentor.py`
  - CLI modules and test suites
  - Dataset and utility modules for Dask/Zarr compatibility

---

### Impact

- Enables scalable, parallel, and memory-efficient inference and output saving for large images.
- Simplifies downstream analysis by supporting Zarr as a native output format.
- Lays the groundwork for further Dask-based optimizations in TIAToolbox.


---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
## 🚀 Summary
This PR introduces a new **[GrandQC Tissue Detection Model](https://github.com/cpath-ukk/grandqc/tree/main)** for digital pathology quality control and integrates **EfficientNet-based encoder architecture** into the TIAToolbox framework.

---

## ✨ Key Changes
- **New Model Architecture**
  - Added `grandqc.py` implementing a UNet++ decoder with EfficientNet encoder for tissue segmentation.
  - Includes preprocessing (JPEG compression + ImageNet normalization), postprocessing (argmin-based mask generation), and batch inference utilities.
- **EfficientNet Encoder**
  - Added `timm_efficientnet.py` providing configurable EfficientNet encoders with dilation support and custom input channels.
- **Pretrained Model Config**
  - Updated `pretrained_model.yaml` to register `grandqc_tissue_detection_mpp10` with associated IO configuration.
  - Corrected `IOSegmentorConfig` references and adjusted resolutions for SCCNN models.
- **Testing**
  - Added comprehensive unit tests for:
    - `GrandQCModel` functionality, preprocessing/postprocessing, and decoder blocks.
    - EfficientNet encoder utilities and scaling logic.
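
As a rough illustration of the preprocessing and postprocessing steps named above, the sketch below applies ImageNet normalization and builds a mask with `argmin`. It is not the code in `grandqc.py`: the function names are hypothetical, the JPEG compression step is omitted, and the channel order (tissue, background) is an assumption.

```python
import numpy as np

# Widely used ImageNet channel statistics (RGB, for inputs scaled to [0, 1]).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_imagenet(image_uint8):
    """Scale an HxWx3 uint8 image to [0, 1], then standardise per channel."""
    image = image_uint8.astype(np.float32) / 255.0
    return (image - IMAGENET_MEAN) / IMAGENET_STD


def argmin_mask(scores):
    """Turn an HxWxC score map into a label mask by taking argmin over C.

    With two channels ordered (tissue, background), which is assumed here,
    argmin gives 1 where tissue scores highest and 0 elsewhere.
    """
    return np.argmin(scores, axis=-1).astype(np.uint8)


scores = np.array([[[0.9, 0.1], [0.2, 0.8]]], dtype=np.float32)
mask = argmin_mask(scores)
print(mask.tolist())  # [[1, 0]]
```

Under the assumed channel order, taking `argmin` rather than `argmax` flips the labels so that tissue pixels come out as 1 in the mask.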
  
## Impact
- Enables high-resolution tissue detection for WSI quality control using state-of-the-art architectures.
- Improves flexibility for segmentation tasks with EfficientNet encoders.
- Enhances code quality and consistency through updated linting and formatting tools.


## Tasks
- [x] Re-host GrandQC model weights on TIA Hugging Face
- [x] Update `pretrained_model.yaml`
- [x] Update `requirements.txt`
- [x] Define GrandQC model architecture
- [x] Add example usage
- [x] Remove segmentation-models-pytorch dependency
- [x] Wait for response from GrandQC authors
- [x] Add tests
- [x] Tidy up

---------

Co-authored-by: Shan E Ahmed Raza <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
# 🚀 Summary

This PR introduces a new **`DeepFeatureExtractor` engine** to the TIAToolbox framework, enabling extraction of intermediate CNN feature representations from whole slide images (WSIs) or image patches. These features can be used for downstream tasks such as clustering, visualization, or training other models. The update also includes:

- A **command-line interface (CLI)** for the new engine.
- Extended **CLI utilities** for flexible input/output configurations.
- Comprehensive **unit tests** covering patch-based and WSI-based workflows, multi-GPU support, and CLI functionality.
- Integration with TIAToolbox’s model registry and CLI ecosystem.

---

## ✨ Key Features

### **New Engine: `DeepFeatureExtractor`**
- Extracts intermediate CNN features from WSIs or patches.
- Outputs feature embeddings and spatial coordinates in **Zarr** or **dict** format.
- Implements **memory-aware caching** for large-scale WSI processing.
- Compatible with:
  - TIAToolbox pretrained models.
  - Torchvision CNN backbones (e.g., ResNet, DenseNet, MobileNet).
  - **All timm architectures via `timm.list_models()`**, including HuggingFace-hosted models.
- Supports both **patch-mode** and **WSI-mode** workflows.

### **CLI Integration**
- Adds `deep-feature-extractor` command to TIAToolbox CLI.
- Supports options for:
  - Input/output paths and file types.
  - Model selection (`resnet18`, `efficientnet_b0`, timm-based backbones, etc.).
  - Patch extraction parameters (`patch_input_shape`, `stride_shape`, `input_resolutions`).
  - Batch size, device selection, memory threshold, overwrite behavior.
- Flexible JSON-based CLI options for resolutions and class mappings.

### **Extended CLI Utilities**
- New reusable options:
  - `--input-resolutions`, `--output-resolutions` (JSON list of dicts).
  - `--patch-input-shape`, `--stride-shape`, `--scale-factor`.
  - `--class-dict` for mapping class indices to names.
  - `--overwrite` and `--output-file` for fine-grained control.
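
JSON-valued options such as `--input-resolutions` and `--class-dict` could be parsed along these lines. This sketch uses the standard-library `argparse` to stay self-contained and is not the toolbox's CLI implementation. The helper names and the `units`/`resolution` keys are assumptions for illustration.

```python
import argparse
import json


def json_list_of_dicts(text):
    """Parse a JSON string and check that it is a list of objects."""
    value = json.loads(text)
    if not isinstance(value, list) or not all(isinstance(v, dict) for v in value):
        raise argparse.ArgumentTypeError("expected a JSON list of dicts")
    return value


def json_class_dict(text):
    """Parse a JSON object and convert its keys to integer class indices."""
    value = json.loads(text)
    if not isinstance(value, dict):
        raise argparse.ArgumentTypeError("expected a JSON object")
    # JSON object keys are always strings, so convert them back to ints.
    return {int(key): name for key, name in value.items()}


parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--input-resolutions", type=json_list_of_dicts)
parser.add_argument("--class-dict", type=json_class_dict)

args = parser.parse_args([
    "--input-resolutions", '[{"units": "mpp", "resolution": 0.5}]',
    "--class-dict", '{"0": "background", "1": "tumour"}',
])
print(args.input_resolutions)
print(args.class_dict)
```

Validating inside the `type` callables means malformed JSON is reported as a normal command-line usage error rather than surfacing later inside the engine.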

### **Unit Tests**
- **Engine Tests**:
  - Patch-based and WSI-based feature extraction.
  - Validation of Zarr outputs (features and coordinates).
  - Multi-GPU functionality.
- **Model Compatibility**:
  - Tests with `CNNBackbone` and `TimmBackbone` models.
- **CLI Tests**:
  - Single-file and parameterized runs.
  - Validation of JSON parsing for CLI options.

### **Codebase Integration**
- Registers `DeepFeatureExtractor` in `tiatoolbox.models` and engine registry.
- Adds CLI command in `tiatoolbox/cli/__init__.py`.
- Updates architecture utilities to support timm-based backbones and HuggingFace models.
- Introduces dictionaries for Torch and timm backbones (`torch_cnn_backbone_dict`, `timm_arch_dict`).